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Abstract 

An equation describing the evolution of phenotypic distribution is derived using methods de- 
veloped in statistical physics. The equation is solved by using the singular perturbation method, 
and assuming that the number of bases in the genetic sequence is large. Applying the equation to 
the mutation-selection model by Eigen provides the critical mutation rate for the error catastro- 
phe. Phenotypic fluctuation of clones (individuals sharing the same gene) is introduced into this 
evolution equation. With this formalism, it is found that the critical mutation rate is sometimes 
increased by the phenotypic fluctuations, i.e., noise can enhance robustness of a fitted state to 
mutation. Our formalism is systematic and general, while approximations to derive more tractable 
evolution equations are also discussed. 
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I. INTRODUCTION 



For decades, quantitative studies of evolution in laboratories have used bacteria and 
other microorganisms [l|, a a. Changes in phenotypes, such as enzyme activity and gene 
expressions introduced by mutations in genes, are measured along with the changes in their 
population distribution in phenotypes 

mm 

Following such experimental advances, 
it is important to analyze the evolution equation of population distribution of concerned 
genotypes and phenotypes. 

In general, fitness for reproduction is given by a phenotype, not directly by a genetic 
sequence. Here, we consider evolution in a fixed environment, so that the fitness is given as 
a fixed function of the phenotype. A phenotype is determined by mapping a genetic sequence. 
This phenotype is typically represented by a continuous (scalar) variable, such as enzyme 
activity, protein abundances, and body size. For studying the evolution of a phenotype, it is 
essential to establish a description of the distribution function for a continuous phenotypic 
variable, where the fitness for survival, given as a function of such a continuous variable, 
determines population distribution changes over generations. 

However, since a gene is originally encoded on a base sequence (such as AGCTGCTT 
in DNA), it is represented by a symbol sequence of a large number of discrete elements. 
Mutation in a sequence is not originally represented by a continuous change. Since the 
fitness is given as a function of phenotype, we need to map base sequences of a large number 
of elements onto a continuous phenotypic variable x, where the fitness is represented as a 
function of x, instead of the base sequence itself. A theoretical technique and careful analysis 
are needed to project a discrete symbol sequence onto a continuous variable. 

Mutation in a nucleotide sequence is random, and is represented by a stochastic process. 
Thus, a method of deriving a diffusion equation from a random walk is often applied. How- 
ever, the selection process depends on the phenotype. If a phenotype is given as a function 
of a sequence, the fitness is represented by a continuous variable mapped from a base se- 
quence. Since the population changes through the selection of fitness, the distribution of 
the phenotype changes accordingly. If the mapping to the phenotype variable is represented 
properly, the evolutionary process will be described by the dynamics of the distribution of 
the variable, akin to a Fokker-Planck equation. 

In fact, there have been several approaches to representing the gene with a continu- 
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ous variable 



. Kimura 



12j developed the population distribution of a continuous fitness. 



Also, for certain conditions, a Fokker-Planck type equation has been analyzed by Levine 



13|. 



Generalizing these studies provides a systematic derivation of an equation describing the evo- 
lution of the distribution of the phenotypic variable. We adopt selection-mutation models 
describing the molecular biological evolution discussed by Eigen[l^, Kauffmanjl^, and oth- 
ers, and take a continuum limit assuming that the number of bases in the genetic sequence 
is large, and derive the evolution equation systematically in terms of the expansion of 

, originally introduced for the evolution of 
RNA, where the fitness is given as a function of a sequence. Mutation into a sequence is 
formulated by a master equation, which is transformed to a diffusion-like equation. With 
this representation, population dynamics over a large number of species is reduced to one 
simple integro-differential equation with one variable. Although the equation obtained is a 
non-linear equation for the distribution, we can adopt techniques developed in the analysis of 
the (linear) Fokker-Planck equation, such as the eigenfunction expansion and perturbation 
methods. 

So far, we have assumed a fixed, unique mapping from a genotype to a phenotype. 
However, there are phenotypic fluctuations in individuals sharing the same genot ype , which 



has recently been measured quantitatively as a stochastic gene expression [16^ 
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22| . Relevance of such fluctuations to evolution has also been discussed 
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2d, 



26|. 



In this case, mapping from a gene gives the average of the phenotype, but phenotype of 
each individual fluctuates around the average. In the second part of the present paper, 
we introduce this isogenic phenotypic fluctuation into our evolution equation. Indeed, our 
framework of Fokker-Planck type equations is fitted to include such fluctuations, so that one 
can discuss the effect of isogenic phenotypic fluctuations on the evolution. 

The outline of the present paper is as follows: We first establish a sequence model in 
section For deriving the evolution equation from the sequence model, we postulate 

the assumption that the transition probability of phenotype values is uniquely determined 
by the original phenotype value. The assumption may appear too demanding at a first 
sight, but we show that it is not unnatural from the viewpoint of evolutionary biology. In 
fact, most models studied so far satisfy this postulate. With this assumption, we derive a 
Fokker-Planck type equation of phenotypic distribution using the Kramers-Moyal expansion 
method from statistical physics 27, 28|. We discuss the validity of this expansion method to 
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derive the equation, also from a biological point of view. 

As an example of the application of our formulation, we study the Eigen's model in section 
(imp , and estimate the critical mutation rates at which error catastrophe occurs, using a 
singular perturbation method. In section (jV]), we discuss the range of the applicability of 
our method and discuss possible extensions to it. 

Following the formulation and application of the Fokker-Planck type equations for evo- 
lution, we study the effect of isogenic phenotypic fluctuations. While fluctuation in the 
mapping from a genotype to phenotype modifies the fitness function in the equation, our 
formulation itself is applicable. We will also discuss how this fluctuation changes the condi- 
tions for the error catastrophe, by adopting Eigen's model. 

For concluding the paper, we discuss generality of our formulation, and the relevance of 
isogenic phenotypic fluctuation to evolution. 

II. DERIVATION OF EVOLUTION EQUATION 

We consider a population of individuals having a haploid genotype, which is encoded on a 
sequence consisting of sites (consider, for example, DNA or RNA). The gene is represented 
by this symbol sequence, which is assigned from a set of numbers, such as { — 1,1}. This set 
of numbers is denoted by S. By denoting the state value of the ith site by Si (g S), the 
configuration of the sequence is represented by the ordered set s = {si, sn}- 

We assume that a scalar phenotype variable x is assigned for each sequence s. This 
mapping from sequence to phenotype is given as function x{s). Examples of the phenotype 
include the activity of some enzyme (protein), infection rate of bacteria virus, and replication 
rate of RNA. In general, the function x{s) is a degenerate function, i.e., many different 
sequences are mapped onto the same phenotypic value x. 

Each sequence is reproduced with rate A, which is assumed to depend only on the phe- 
notypic value X, as A{x); this assumption may be justified by choosing the phenotypic value 
X to relate to the replication. For example, if a protein concerns with the metabolism of 
a replicating cell, its activity may affect the replication rate of the cell and of the protein 
itself. 

In the replication of the sequence, mutation generally occurs; for simplicity, we consider 
only the substitution of s{i). With a given constant mutation rate fi over all sites in the 
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sequence, the state s[ of the daughter sequence is changed from Sj of the mother sequence, 
where the value s'^ is assigned from the members of the set 5* with an equal probability. 
We call this type of mutation symmetric mutation 29!]. The mutation is represented by 
the transition probability Q{s s'), from the mother s to the daughter sequence s'. The 
probability Q is uniquely determined from the sequence s, the mutation rate fi, and the 
number of members of S. The setup so far is essentially the same as adopted by Eigen et 
al.[l^, where the fitness is given as a function of the RNA sequence or DNA sequence of 
virus. 

Now, we assume that the transition probability depends only on the phenotypic value x, 
i.e., the function Q can be written in terms of a probability function W, which depends only 
on X, W{x — >• x'), as 

J2 Q{s s') = W{x{s) x'). (1) 

s'<^{s'\x'=x{s')} 

This assumption may appear too demanding. However, most models of sequence evolu- 
tion somehow adopt this assumption. For example, in Eigen's model, fitness is given as a 
function of the Hamming distance from a given optimal sequence. By assigning a phenotype 
X as the Hamming distance, the above condition is satisfied (this will be discussed later). 
In Kauffman's NK model, if we set ^ 1, ^ 1, and K/N <^ 1, this assumption is 
also satisfied (see Appendix |Al). For the RNA secondary structure model 



301 ] ■ this assump- 



tion seems to hold approximately, from statistical estimates through 
Some simulations on a cell model with chemical reaction networks 



1 numerical simulations. 



20 



311 ] also support the 



assumption. In factja similar assumption has been made in evolution theory with a gene 



substitution process |32l. |33]. 



The validity of this assumption in experiments has to be confirmed. Consider a selection 
experiment to enhance some function through mutation, such as the evolution of a certain 
protein to enhance its activity [7]. In this case, the assumption means that the activity 
distribution over the mutant proteins is statistically similar as long as they have the same 
activity, even though their mother protein sequences are different. 

With the above setup, we consider the population of these sequences and their dynamics, 
allowing for overlap between generations, by taking a continuous-time model 29|] . We do 
not consider the death rate of the sequence explicitly since its consideration introduces only 
an additional term, as will be shown later. The time-evolution equation of the probability 
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distribution P{s, t) of the sequence s is given by 
dP{s,t) 



dt 



A{t)P{s, t) + Y. A{x{s'))Q{s' s)P{s\ t), (2) 



as specified by Eigenjl^. Here the quantity A{t) is the average fitness of the population at 
time t, defined by A{t) = ^(2;(s))-P(s, t) and Q is the transition probabihty satisfying 
J2s Q{s' ^ s) = 1 for any s'. 

According to the assumption ([1]), eq. ([2]) is transformed into the equation for P{x,t), 
which is the probabihty distribution of the sequences having the phenotypic value x, defined 
by P{x,t) = J2se{s\x=x{s)} Pis,t). The equation is given by 

dP{x,t) _ 
di ~ " 

where the function W satisfies 



'A{t)P{x, t) + J2 Mx')W{x' x)P{x', t), (3) 



^ W{x' x) = 1 for any x', (4) 



as shown. 



Since is sufficiently large, the varia 



Die X is regarded as a continuous variable. By using 



the Kramers- Moyal expansion 27|, |28|, |3J], with the help of property (jl]), we obtain: 



= {A{x) - A{t))P{x, t) + ^ ^—f^m^{x)A{x)P{x, t), (5) 

where m„(x) is the nth moment about the value x, defined by m„(x) = J{x' — x)"'W{x 
x')dx' . 

Let us discuss the conditions for the convergence of expansion ([5]), without mathemat- 
ical rigor. For convergence, it is natural to assume that the function W{x' x) decays 
sufficiently fast as x gets far from x', by the definition of the moment. 

Here, the transition W{x' — x) is a result of n point mutants of the original sequence s' 
for n = 0,1,2, N. Accordingly, we introduce a set of quantities, Wn{x{s') — > x), as the 
fitness distribution of n point mutants of the original sequence s' (Naturally, Wq{x{s') ^ x) = 
6{x{s') —x), which does not contribute to the nth moment m„ (n > 1)). Next, we introduce 
the probability pn that a daughter sequence is an n point mutant (n = 0, 1,2, ...,N) from 
her mother sequence, which are determined only by the mutation rate /i and the sequence 
length A^. Indeed, pjs form a binomial distribution, characterized by /i and A^. 
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In terms of the quantities w„ and Pn, we are able to write down the transition probabihty 
W as 

N 

W{x{s') X) = PnWn{x{s') x) . (6) 
n=0 

Now, we discuss if W{x{s') — > x) decays sufficiently fast with \x{s') —x\. First, we note that 
the width of the domain, in which Wn{x{s') — > x) is not close to zero, increases with n since 
n-point mutants involve increasing number of changes in the phenotype with larger values 
of n. Then, to satisfy the condition for W{x{s') — > x), at least the single-point-mutant 
transition Wi{x{s') — > x) has to decay sufficiently fast with \x{s') — x\. In other words, 
the phenotypic value of a single-point mutant s of the mother sequence s' must not vary 
much from that of the original sequence, i.e., \x{s') — x{s)\ should not be large ("continuity 
condition" ) . 

In general, the domain \x — x{s')\, in which Wn{x{s') x) ^ 0, increases with n. On 
the other hand, the term p„ decreases with n and with the power of yu". Hence, as long as 
the mutation rate is not large, the contribution of Wn to W is expected to decay with n. 
Thus, if the continuity condition with regards to a single-point mutant and a sufficiently low 
mutation rate are satisfied, the requirement on W{x{s') x) should be fulfilled. Hence, 
the convergence of the expansion is expected. 

Following the argument, we further restrict our study to the case with a small mutation 
rate ii such that /iiV -C 1 holds. The transition probability W in eq. (E]) is written as 

W{x{s') ^ x) ~ (1 - ^mN)6{x{s') -x)+ ijNwi{x{s') x), (7) 

where we have used the property that p^'s form the binomial distribution characterized by 
fi and A^. Introducing a new parameter, 7 (7 = fJ'N), that gives the average of the number 
of changed sites at a single-point mutant, and using the transition probability ([7]), we obtain 



^^^""'^^ - - A(t))P(x,t) + 71: ^^|^m«(x)A(x)P(x,t), (8) 



dt ' n\ dx"^ 



where m^\x) is the nth moment of wi{x — > x'), i.e., m^\x) = J{x' — x)^wi{x x')dx' . 

When we stop the expansion at the second order, as is often adopted in statistical physics, 
we obtain 



^^ = (A(a:)-A(t))P(a:,t)+7^ 



(1)/ \ 1 ^ (1)/ \ 
-mi '[x) + 2^"^2 [x) 



A{x)P{x,t). (9) 
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Eqs. (IH]) and ([9]) are basic equations for the evolution of distribution function. Eq. ([9]) is an 
approximation. However, it is often more tractable, with the help of techniques developed for 
solving the Fokker-Planck equation ( see Appendix [B] and [355), while there is no established 
standard method for solving eq. ([H])- 

At the boundary condition we naturally impose that there are no probability flux, which 
is given by 



E 

n=l 



-1 



-m^^\x)A{x)P{x,t) 



0, 



in the case of (IHl) and 



(1)/ \ 1 ^ (1)/ \ 



A{x)P{x,t) 







(10) 



(11) 



X=Xl,X2 



in the case of (Q, where Xi and X2 are the values of the left and right boundaries, respectively. 

Next, as an example of the application of our formula, we derive the evolution equation 
for Eigen's model, and estimate the error threshold, with the help of a singular perturbation 
theory. Through this application, we can see the validity of eq. as an approximation of 
eq. m- 

Two additional remarks: First, introduction of the death of individuals is rather straight- 
forward. By including the death rate D{x) into the evolution equation, the first term 
in eq. (IHl) (or eq. (E])) is replaced by {A{x) - D{x)) - {A{t) - D{t)) P{x,t), where 
D(t) = J D{x)P{x,t)dx. Second, instead of deriving each term in eq. from microscopic 
models, it may be possible to adopt it as a phenomenological equation, with parameters (or 
functions) to be determined heuristically from experiments. 



III. APPLICATION OF ERROR THRESHOLD IN EIGEN MODEL 

In the Eigen modeljl^, the set S of the site state values is given by {—1, 1}, and the 
fitness (replication rate) of the sequence is given as a function of its Hamming distance from 
the target sequence {!,...,!}, i.e., the fitness of an individual sequence is given as a function 
of the number n of the sites of the sequence having value 1. Hence it is appropriate to 
define a phenotypic value x in the Eigen model as a monotonic function of the number n; 
we determine it as x = '^^^'^ , in the range [—1,1]. Accordingly, the replication rate A of 
the sequence can be written as a function of x, i.e., A{x); it is natural to postulate that A 
is a non-negative and bounded function over the whole domain. If the sequence length is 
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sufficiently large, the phenotypic variable x can be regarded as a continuous variable, since 
the step size of x {Ax = j^) approaches as goes to infinity. 

In order to derive the evolution equation of form ([8]) corresponding to the Eigen model, 
we only need to know the function Wi in that model. (Recall that in our formulation the 
mutation rate fi is assumed to be so small that only a single-point mutation is considered.) 
Due to the assumption of the symmetric mutation, this distribution function is obtained 
as Wi(a; — > x — Ax) = wi{x x + Ax) = and wi{x — > x') = for any other 



x'. Accordingly, the nth moment is given by m^^^(x) = ^^(— Ax)" + ^-^^(Ax)". Now, we 



obtain 

dP{x,t) 
di 



{A{x)-A{t))P{x,t)+^Y: 



1 9" 



~^ n\ dx"- 



1 + x 



n ^ 
+ - 



X 



2 



A{x)P{x,t) 
(12) 



where 7 = A^/x, the mutation rate per sequence. When we ignore the moment terms higher 
than the second order, we have 



dP{x,t) 
di 



(A(x)-A(t))P(x,t) + ^^ 



l_d_ 



A{x)P{x,t). 



(13) 



In fact, if we focus on a change near x ~ ( to be specific x ~ 0(l/v A^)), the truncation 
of the expansion up to the second order is validated (Or equivalently, if we define x' = 
(2n — N)/\/N instead of (2n — N)/N, and expand eq.(3) by l/\fN instead of 1/A^, terms 
higher than the second order are negligible, as is also discussed in However, in this case, 
the validity is restricted to x' ~ 0(1) (i.e., (n — A^/2) ~ 0(1)), which means x ~ 0(l/\/iV) 
in the original variable). 

Now, we solve the eq. ( |T3l) with a standard singular perturbation method (see Appendix 
[B|) . and then return to eq. f|T2l) . According to the analysis in Appendix [Bl the stationary 
solution of the equation of form (fT3!) is given by the eigenfunction corresponding to the 

A{x) with 



largest eigenvalue of the linear operator L defined hy L = A{x) + 2'-)£ 
e = jj. Now we consider the eigenvalue problem 



A. 

dx 



X + e 



dx 



A{x)P{x) + 275 



dx 



X + e 



d_ 

dx 



A{x)P{x) = AP(x), 



(14) 



where P(x) > 0, with A to be determined. 

Since e is very small (because A^ is sufficiently large), a singular perturbation method. 



the WKB approximation 



36|, is apphed. Let us put 



P(a;)=e^/.>(^'-')^-', 



(15) 



where xq is some constant and i? is a function of e and x, which is expanded with respect 
to e as 

R{e, x) = Ro{x) + eRi{x) + e'^R2{x) + ... (16) 
Retaining only the zeroth order terms in e in eq. fll4p . we get 



A{x) + 27 xRo{x) + Rl{x) A{x) = A 



(17) 



which is formally solved for Rq as R'^\x 



-x±Wg{x) , , . 

1 where g[x) 



Hence the general solution of eq. (fT4l) up to the zeroth order in e is given by P{x) 

i r R\^\x')dx' n - r Rn~\x')dx' , „ ...11. • 1 

ae^ ^o " + fje ''o " with a and p constants to be determmed. 



P(x 



Now, recall the boundary conditions ( JTTl) : P has to take the two branches in Rq as 

^g^/^f.-^o {^)d-x X < Xh and P{x) = /Je" •^'"6 ^" '•^ ^"^^ for x > Xb, where Xf, is 



defined as the value at which g{x) has the minimum value. Next, from the continuity of P 
at Xb, a = P follows, while from the continuity of ^ at Xb, the function g has to vanish at 



X 



Xb. This requirement g{xb) = determines the value of the unknown parameter A as 



A = A(x6)(l-|xfe2). 



(18) 



From function P, we find that P has its peak at the point x = Xp, where Ro{x) vanishes, 
i.e., at A{xp) = A. Then, P{x) approaches 6{x — Xp) in the limit e —>■ +0. These results are 
consistent with the requirement that the mean replication rate in the steady state be equal 
to the largest eigenvalue of the system (see Appendix [B|) . 

The stationary solution of eq. (fT2|) is obtained by following the same procedure of singular 
perturbation. Consider the eigenvalue problem 



°° 1 (9" 
Aix)Pix)+,^-- 



(2£)" + (_2e)" 



A{x)P{x) = \P{x). (19) 



By putting P(x) = c -f^o^"^^ ^'^^ and taking only the zeroth order terms in e, we obtain 



A(x)+-f 



1+x 



.2Ro{x) 



.\ i- - X / 



-2Ro(x) 



A{x) = A, 



which gives 



2 1 + a; 



with^(a:) = (l + i(^-l))2-(l-a;2). 
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By defining again the value x = Xb aX which g{x) takes the minimum, P is represented as 
P{x) = ae^ ^"^b^^ {x)dx X < Xfy and P{x) = I3e''^'^t>^° {x)dx ^ ^ continuity 
oi ^ at X = Xb requires g{xb) = 0, which determines the value of A as 



A = A{xb) 



7(1- - Xb^) 



(20) 



Again, P{x) = 6{x — Xp), in the limit e +0, with Xp given by the condition A{xp) = A. 
When \xb\ ^ 1, the form (!20|) approaches eq. (fTSll asymptotically. This implies that the 
time evolution equation ([H]), if restricted to |x| ^ 1, is accurately approximated by eq.Q 
that keeps the terms only up to the second moment. 

Let us estimate the threshold mutation rate for error catastrophe. This error threshold 
is defined as the critical mutation rate 7* at which the peak position Xp of the stationary 
distribution drops from Xp ^ to Xp = 0, with an increase of 7. We use the following 
procedure to obtain the critical value 7*. 

First consider an evaluation function whose form corresponds to that of eigenvalue (!20l) 

as 

fix) = Aix) [1 - 7 (1 - v^r^)] , (21) 

and find the position at which the function f{x) takes the maximum value. This pro- 
cedure is equivalent to obtaining Xb in the above analysis, since the relation f{x) = 
A — ^ A{x)(i 7(i+v^r~i? ))^^'^-^ requirement on Xb that g{xb) = and = 



lead to ^ 

ax 



X=Xl, 

0. Obviously, Xb is given as a function of 7, thus, we denote it by 



X=Xl, 

Xbi'y). The position Xb determines the position Xp of the stationary distribution through the 
relation A{xp) = A = f{xb) as in the above analysis. If A has fiat parts around x = and 
higher parts in the region (x > 0), 0:^(7) discontinuously changes from Xp 7^ to Xp = at 
some critical mutation rate 7*, when 7 increases from zero. A schematic illustration of this 
transition is given in Fig.([T]). 

As a simple example of this estimate of error threshold, let us consider the case 

A{x) = l + Aoe{x-xo), (22) 

with y4o > and < xq < 1, and G as the Heaviside step function, defined as 0(x) = 
for X < and G(x) = 1 for x > 0. According to the procedure given above, the critical 
mutation rate is straightforwardly obtained as 7* = / ^° , 1 < 1* , Xp = Xq 

il+Ao)il-^yl-xo'^] 

and for 7 > 7*, Xp = 0. 
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Remark 

An exact transformation from the sequence model (Eigen model|l^) into a class of Ising 



models 



37 



38[ has recently been reported, such that the sequence model is treated ana- 
lytically with methods developed in statistical physics. Rigorous estimation of the error 
threshold for various fitness landscapes 29|, |39(] and relaxation times of species distribution 



have been obtained[40|]. In fact, our estimate (above) agrees with that given by their analysis. 

Their method is indeed powerful when a microscopic model is prescribed in correspon- 
dence with a spin model. However, even if such microscopic model is not given, our for- 
mulation with a Fokker-Planck type equation will be applicable because it only requires 
estimation of moments in the fitness landscape. Alternatively, by giving a phenomenolog- 
ical model describing the fitness without microscopic process, it is possible to derive the 
evolution equation of population distribution. Hence, our formulation has a broad range of 
potential applications. 

IV. CONSIDERATION OF PHENOTYPIC FLUCTUATION 

In this section, we include the fluctuation in the mapping from genetic sequence to the 
phenotype into our formula, and examine how it influences the error catastrophe. We 
first explain the term "phenotypic fluctuation" briefly, and show that in its presence our 
formulation (JHl) remains valid by redefining the function A{x). By applying the formulation, 
we study how the introduction of the phenotypic fluctuation changes the critical mutation 
rate 7* for the error catastrophe. 

In general, even for individuals with identical gene sequences in a fixed environment, the 
phenotypic values are distributed. Some examples are the activities of proteins synthesized 
from the identical DNA j4l|, the shapes of RNA molecules of identical sequences 42 1, and 



20| . Next, the phenotype 



the numbers of specific proteins for isogenic bacteria 3, 3, 
X from each individual with the sequence s is distributed, which is denoted by Pphe{s, x). 

We assume that the form of distribution Pph^ is characterized only in terms of its mean 
value, i.e., the distributions PphJs having the same mean value X take the same form. By 
representing the mean value of the phenotype x by the distribution Pphe is written 

as Pphe{.s,x) = Pphe{x{s),x), whcrc Pphe is a function of x and x, which is normalized with 
respect to x, i.e., satisfying / Pphe{x,x)dx = 1. 
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In our formulation, the replication rate A of the sequence with the phenotypic value x is 
given by a function of phenotypic value x, denoted by A{x). The mean replication rate A 
of the species s is calculated by 

A{x{s)) = J Pphe{x{s),x)A{x)dx. (23) 

As in the case of ([1]), we assume that the transition probability from s to s' during the 
replication is represented only by its mean values x{s) and x{s'), i.e., the transition prob- 
ability function is written as W{x{s) —>■ x{s')). With this setup, the population dynamics 
of the whole sequences is represented in terms of the distribution of the mean value x only, 
so that we can use our formulation ([8]) even when the phenotypic fluctuation is taken into 
account; we need only replace the replication rate A in ([8]) by the mean replication rate A 
obtained from eq. fl2^ . 

Now, we can study the influence of phenotypic fluctuation on the error threshold by 
taking the step fitness function A{x) of eq. f l22i) and including the phenotypic fluctuation as 
given in eq. fl23] ). We consider a simple case where the form of Pphe is given by a constant 
function within a given range (we call this the piecewise flat case). Our aim is to illustrate 
the effect of the phenotypic fluctuation on the error threshold, so we evaluate the critical 
mutation rate 7* using the simpler form f{x) = A{x){l — ^x^) from eq. fllSp . while the use of 
the form ( 12T|) gives the same qualitative result. With this simpler evaluation function, the 
critical mutation rate 7* is given by 

in the case without phenotypic fluctuation. Here we examine if this critical value 7q increases 
under isogenic phenotypic fluctuation. 

We make two further technical assumptions in the following analysis: first we assume 
that Aq in the form (l22l) is sufficiently small, so that the value of critical 7* is not large. 
Second, we extend the range of x to [—00, 00] for simplicity. This does not cause problems 
because we have set the range of xq to (0, 1). Hence, the stationary distribution has its peak 
around the range < x < 1; everywhere outside this range, the distribution vanishes. 

We consider the case in which distribution Pphe of the phenotype of the species s is given 
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by 

for X < X — 



^ ior x-i<x<x + £ (25) 



2£ 

for X + i < X 



where i gives the half- width of the distribution. ((F) represents the piecewise-fiat distribu- 
tion case). Then, A is calculated by 

1 for X < Xo — ^ 



1 + If (x - (xo - i)) for Xo - ^ < X < Xo 
1 + Aq for Xo + ^ < X. 



An example of A^^\x) is shown in Fig. ([2]). The evaluation function / in section fillip is 
given by /(^)(x) = A^^\x){l - fx^). 

We study the case where the position x^'-^-'(= x[^''(7*)) is within the range [xq — i,xo] 
because the profile of A^^^ shows that 7*^^^ is smaller than 7q if x^^^-* > xo- If ^ ^ < xq, 
the position x^*-^-* is within the range [xq— ^, Xo]. In that case, 7**^-^) is given by 7*^^^ ~ ^f (^°_(^^ 
to the first order of Aq. Comparing 7*^^^ with 7q in fl2^ . we conclude that 7*'^-^) < 7^ for 
< £ < ^^Xq, and 7*^-^^ > % for ^^Xo < ^ < Xq. Hence, when the half width £ 
of the distribution Pphe is within the range ( ^"^^^ Xo, Xq), the critical mutation rate for the 
error catastrophe threshold is increased. In other words, the isogenic phenotypic fluctuation 
increases the robustness of high fitness state against mutation. 

We also studied the case in which Pphe{x, x) decreases linearly around its peak, i.e., with 
a triangular form. In this case, the phenotypic fluctuation decreases the critical mutation 
rate as long as Aq is small, while it can increase for sufficiently large values of Aq, for a 
certain range of the values of width of phenotypic fluctuation. 



V. DISCUSSION 



In the present paper, we have presented a general formulation to describe the evolution of 
phenotype distribution. A partial differential equation describing the temporal evolution of 
phenotype distribution is presented with a self-consistently determined growth term. Once a 
microscopic model is provided, each term in this evolution equation is explicitly determined 
so that one can derive the evolution of phenotype distribution straightforwardly. This eq. 
([8]) is obtained as a result of Kramers-Moyal expansion, which includes inflnite order of 
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derivatives. However, this expansion is often summed to a single term in the large number 
limit of base sequences, with the aid of singular perturbation. 

If the value of a phenotype variable |x| is much smaller than unity (which is the maximal 
possible value giving rise to the fittest state), the terms higher than the second order can 
be neglected, so that a Fokker-Planck type equation with a self-consistent growth term is 
derived. The validity of this truncation is confirmed by putting x' = {2n — N) / \/N and 
verifying that the third or higher order moment is negligible compared with the second-order 
moment. Thus the equation up to its second order, ([9]), is relevant to analyzing the initial 
stage of evolution starting from a low-fitness value. 

As a starting point for our formalism, we adopted eq. ([2]), which is called the "coupled" 
mutation-selection equation[43]. Although it is a natural and general choice for studying the 
evolution, a simpler and approximate form may be used if the mutation rate and the selection 
pressure are sufficiently small. This form given by ^^^^'^^ = —A(t)P(s, t) + Y^s'Qi^' — >■ 



s)P{s',t), is called the "parallel" mutation-selection equation |4J]. It approaches the 
coupled mutation-selection equation ([2]), in the limits of small mutation rate and selection 
pressure, as shown in 43(]. If we start from this approximate, parallel mutation-selection 
equation, and follow the procedure presented in this paper, we obtain ^'^g^'*'* = {A(x) — 
A{t))P{x,t)+^£ [-mS')(x) + i^m(^)(x)] P{x,t). 

In general, this equation is more tractable than eq. ([9]), as the techniques developed in 
Fokker-Planck equations are straightforwardly applied as discussed in [35|, and it is also 
useful in describing of evolution. Setting A{x) = x"^ and replacin^mj^^^ and m^2^ with some 
constants, the equation is reduced to that introduced by Kimura(l2l|: while setting A{x) = x, 
m^i\x) oc X, and replacing with some 

Because our formalism is general, these earlier studies are derived by approximating our 
evolution equation suitably. 

Besides the generality, another merit of our formulation lies in its use of the phenotype as a 
variable describing the distribution, rather than the fitness (as adopted by Kimura). Whereas 
the phenotype is an inherent variable directly mapped from the genetic sequence, the fitness 
is a function of the phenotype and environment, and strongly infiuenced by environmental 
conditions. The evaluation of the transition matrix by mutation in eq.([8]) would be more 
complicated if we used the fitness as a variable, due to crucial dependence of fitness values on 
the environmental conditions. In the formalism by phenotype distribution, environmental 
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change is feasible by changing the growth term A{x) accordingly. Our formalism does include 
the fitness-based equation as a special case, by setting A{x) = x. 

Another merit in our formulation is that it easily takes isogenic phenotypic fluctuation 
into account without changing the form of the equation, but only by modifying A{x). By 
applying this equation, we obtained the influence of isogenic phenotype fluctuations on error 
catastrophe. The critical mutation rate for the error catastrophe increases because of the 
fluctuation, in a certain case. This implies that the fluctuation can enhance the robustness 
of a high-fitness state against mutation. 

In fact, the relevance of isogenic phenotypic fluctuations on evolution has been recently 
proposed 23, 24, and change in phenotypic fluctuation through evolution has been ex- 
perimentally verified^, Q]. In general, phenotypic fluctuations and a mutation-selection 
process for artificial evolution have been extensively studied recently. The present formula- 
tion will be useful in analysing such experimental data, as well as in elucidating the relevance 
of phenotypic fluctuations to evolution. 
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Figures 



f, A 

A(x) 




FIG. 1: Examples of profiles of the evaluation function / for three values of 7. The red, 
purple, and blue curves give the profiles of / for 7 = 0.31, 7 = 0.386, and 7 = 0.49, 
respectively, where / is defined by f{x) = A{x){l — 7(1 — Vl — x"^)) and A is given by 
A{x) = 1 + 0.2(x - 0.25)e(x - 0.25)9(0.75 - x) + OAQ{x - 0.75); the profile of A is 
indicated by the black curve. This illustrates determination of and Xp] Xb is given by 
the position where / takes a maximum, while Xp is given as the position where the line 
y = f{xh) crosses the curve of A. For 7 < 0.386, f{x) has a maximum value at x = Xh, 
and thus the critical mutation rate for the error threshold is estimated to be 7* = 0.386. 




FIG. 2: Example of profiles of the mean fitness functions without phenotypic fiuctuation 
case (black); with a constant phenotypic fiuctuation over a given range given by eq. (j25p 
(red), where we set A{x) = 1 + 0.16(x - 0.5) and i = 0.25. 
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APPENDIX A: ESTIMATION OF THE TRANSITION PROBABILITY IN THE 
NK MODEL 



In the NK model 15|, |45| , the fitness / of a sequence s is given by 



1 ^ 

i=l 

where uji is the contribution of the ith site to the fitness, which is a function of Sj and the 
state values of other K sites. The function uJi takes a value chosen uniformly from [0, 1] at 
random. We assume that the phenotype x of the sequence s is given by x = /(s). 

When ^ 1, ^ 1, and K/N <^ 1, the phenotype distribution of mutants of a given 
sequence s (whose phenotype is x) is characterized only by the phenotype x (without the 
need to specify the sequence s). For showing this, we first examine the one-point mutant 
case. 

We consider the "number of changed sites" of sites at which u's are changed due to a 
single-point mutation. By assuming that the average number of changed sites is K, the 
distribution of the number of changed sites n, denoted by Psitein), is approximately given 
by 

Psitein) ~e (Al) 

with the help of the limiting form of binomial distribution. Here, we have omitted the 
normalization constant. 

Next, we study the distribution of the difference between the phenotype x of the original 
sequence and the phenotype x' of its one-point mutant, given the number n of changed sites 
of the single-point mutant. We denote the distribution as Pdiff{n; X), where X = x' — x. 
Here the average of x' is x{N — n)/N, since (A^ — n) sites are unchanged. Thus, according 
to the central limit theorem, the distribution is estimated as 

(X + ^xy 



Pdiff{n;X) ~ exp 



9n — 



(A2) 



where is the variance of the distribution of the value of u. This variance is estimated 
from the probability distribution P(s,{a;j)(ci;) that the sequence u is generated.. Although 
the explicit form of P(s,{uji}) is hard to obtain unless {ooi} and s are given, it is estimated by 
means of the "most probable distribution," obtained as follows: Find the distribution that 
maximizes the evaluation function 5* (called "entropy") defined by S* = — P{uj) \ogP{uj)(ko 
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under the conditions Jq P{uj)duj = 1 and Jq ujP{uj)duj = x. Accordingly the variance cr^ may 
depend on x. 

Combining these distributions (lAip and (lA2p gives the distribution of X without con- 
straint on the number of changed sites: 

N 

P{X) = Ps^te{n)Pd^ff{n; X) ~ exp 

n=l 

This result indicates that the phenotype distribution of single-point mutants from the orig- 
inal sequence s having the phenotype x is characterized by its phenotype x only; s is not 
necessary. Similarly, one can show that phenotype distribution of n-point mutants is also 
characterized only by x. Hence, the transition probability in the NK model is described only 
in terms of the pheno types of the sequences, when ^ 1, K ^ 1, and K/N <^ 1. 



AT" 



APPENDIX B: MATHEMATICAL STRUCTURE OF THE EQUATION OF 
FORM dl]) 



We first rewrite eq. ([9]) as 

dP{x,t) 
di 



-A{t)P{x,t) + L{x)P{x,t), (Bl) 



where L is a linear operator, defined by L{x) = A{x) + -§^f{x) + -§^g{x) with f{x) = 
—'-fmi'\x)A{x) and g{x) = ^■m'2\x)A{x). 

The linear operator L is transformed to an Hermite operator using variable transforma- 
tions (see below) so that L is represented by a complete set of eigenf unctions and corre- 
sponding eigenvalues, which are denoted by {0j(x)} and {Aj} (i = 0,1,2,...), respectively. 
Eigenvalues are real and not degenerated, so that they are arranged as Aq > Ai > A2 > .... 

According to [35i], P{x,t) is expanded as 



P(a;,t) = ^a,(t)0,(x), (B2) 

i=0 



where Oj satisfies 

-5^ = a.(t)(A.-E«.WA,). (B3) 

The prime over the sum symbol indicates that the summation is taken except for those of 
noncontributing eigenfunctions as defined in 
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Stationary solutions of eq. (1B3P are given by {ak = 1 and = for i ^ k}. Among 
these stationary solutions, only the solution {ao = 1 and = for z 7^ 0} is stable. Hence, 
the eigenfunction for the largest eigenvalue (the largest replication rate) gives the stationary 
distribution function. Now it is important to obtain eigenf unctions and eigenvalues of L, in 
particular the largest eigenvalue Aq and its corresponding eigenfunction 0o- Hence, we focus 
our attention on the eigenvalue problem 



P{x) = XP{x) 



(B4) 



where A is a constant and P is a function of x. 

We can transform eigenvalue problem (1B4I) to a Schroedinger equation-type eigenvalue 



problem as follows: First we introduce a new variable y related to x as y{x) = J^^^ y g(x') 
where xq and h are constants. Next, we consider a new function related to P{x) as 



f \ doc 



P(x] 



x=x(y) 



where is some constant, x{y) the inverse function of y{x), and / a function of y defined 
by 

f{y) - 



2 dx 



x=x{y) 



Using these new quantities y and \1/ and rewriting eigenvalue problem (lB4p suitably, we 



get 



d}(v) 

where V{y) = A{y) + 
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(B5) 



4h 



with A{y) = A{x{y)). 
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